Pillar of Informatics
Pillar VII
To improve healthcare efficiency and to assess the effectiveness of implemented protocols, interventions, and newly introduced methods, the vast amount of data generated during care must first be transformed into a processable format and then analyzed. The fundamental goal of the National Laboratory for Translational Neuroscience (TINL) is to establish a data-driven healthcare system for the treatment and research of neurological diseases, which requires the development of data recording methods.
The planning and development of an IT solution in line with the TINL concept involves the following steps:
- Designing a data collection system (specifying forms)
- Aligning data with international standards
- Developing data collection solutions (within the e-Medsolution system and through a custom form system)
- Creating a standalone TINL data structure and databases suitable for analysis (on the University of Pécs's own infrastructure)
- Designing and developing data loading into a central database (on the University of Pécs's own infrastructure)
- Serving analytical needs
To achieve the project goals, the following IT developments will be implemented:
- Establishing structured collection of healthcare data within the e-Medsolution system by developing and configuring specialty-specific forms (based on specifications prepared by the professional pillars).
- Developing solutions to support the collection of patient-reported outcome (PRO) data in specific disease areas (movement disorders and spinal surgery) by configuring and using the Limesurvey system.
- Designing and developing a central data platform that supports the integration and analysis of data from various sources, along with procedures that ensure data transformation and compliance with data security requirements.
- Supporting research and analysis.
- Using solutions and technologies that ensure data protection.
1. Developing Structured Data Collection Forms in the e-Medsolution System
Currently, the databases created by Hungarian patient information systems are largely unstructured: they contain free-text entries that cannot be processed digitally. As a result, improving healthcare practices (protocols, interventions, etc.) or performing health economic analyses currently requires registries to be built manually to obtain structured databases.
Genuinely structured electronic health record (EHR) data, including measured values (e.g., laboratory and imaging results, as well as histological and biomarker tests), provide detailed information about patient characteristics, reasons for changing or interrupting therapy, disease stages, and side effects. These data are crucial for comparative effectiveness assessment and research. Because structured EHR data reflect real clinical practice, they show how healthcare technologies perform in actual patient populations, which may differ significantly from the results of registry-based clinical trials. This is of substantial significance for the national economy, as analyzing structured data can reveal the real-world effectiveness of treatments.
Within Pillar VII, our work focuses on establishing an IT infrastructure capable of structured data collection in healthcare. In the first phase, we created structured forms for the neurological diseases of greatest public health importance and developed the technique for creating such forms.
For numerous conditions (stroke, hemorrhagic stroke, movement disorders, psychosis, epilepsy, spinal surgery, traumatic brain injury, multiple sclerosis, myasthenia gravis), we have developed specific forms that align with international and domestic medical guidelines and protocols and largely meet the requirements of international data standards. These forms have been integrated into the e-Medsolution system to support patient care, and for several diseases structured data collection is already in use in live clinical practice at the Clinical Center of the University of Pécs. This allows the collected data to be used in clinical and health economic research.
As an example, the figure shows the top level of the stroke-specific form structure, which comprises several hundred fields and recorded variables.
Within the framework of TINL, the forms developed have already been adopted in patient care not only by the University of Pécs but also by several of our partners. This creates a solid foundation for close collaboration in research and analysis. We hope that our data collection system, developed for patient care, will also serve as a guiding model for other areas of healthcare.
In a parallel development phase, in collaboration with the Health Informatics Service and Development Center (ESZFK), we have begun creating solutions to verify the completeness and quality of form entries and to improve the use of the forms. With this development and functional expansion, the e-Medsolution system will be suitable for long-term use across the whole cycle: configuring structured forms, filling them out during patient care, and verifying the completeness of the entries.
2. Patient-Completed Questionnaires
In several disease areas, the general patient care protocol requires patients to answer predefined questions in personal questionnaires. To address this, the TINL project initially developed structured (non-free-text) questionnaires for movement disorders and spinal surgery. These questionnaires assess the patients' current condition based on their own perception. They were developed using the Limesurvey system.
Questionnaires can be completed either during a consultation with a physician (on a tablet) or at home in response to an email notification; the content of the questionnaires is the same in both cases. Through the practical use of these questionnaires in patient care, we have collected hundreds of data entries on patients' self-assessment of their condition. In further processing, these responses are treated as result (report) data.
The scoring system associated with the answers provides an aggregated result, allowing conclusions to be drawn regarding the patient's condition.
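The text does not specify the scoring rules. As a minimal sketch, assuming that each answer option maps to a numeric score and that the aggregated result is the plain sum of item scores, the aggregation could look like this; the answer labels, score values, and the `aggregate_score` function are hypothetical.

```python
from typing import Mapping

# Hypothetical answer options and scores; each real instrument defines its own scale.
ANSWER_SCORES: dict[str, int] = {
    "none": 0,
    "mild": 1,
    "moderate": 2,
    "severe": 3,
}


def aggregate_score(responses: Mapping[str, str]) -> int:
    """Return the sum of item scores for one completed questionnaire.

    `responses` maps a question code to the selected answer option.
    A KeyError is raised for an option without a defined score, so that
    unexpected answers are not silently counted as zero.
    """
    return sum(ANSWER_SCORES[answer] for answer in responses.values())


# Example: three structured answers from one patient.
example_total = aggregate_score({"Q1": "mild", "Q2": "severe", "Q3": "none"})
print(example_total)  # 4
```

A simple sum is only one option; instruments with weighted items or subscales would replace the `sum` with their own published rule.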
3. Design and Development of the Central Data Platform
The system processes data from the source systems through several phases and layers, ultimately making them accessible to users in a form suitable for analysis. This process is illustrated schematically in the following figure.
Patient care data are extracted from the e-Medsolution system in HL7, the standard format for healthcare data exchange. For patient-completed questionnaire data, the Limesurvey system generates an export file once a questionnaire has been finalized.
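The text does not state which HL7 version or encoding is exported. Assuming HL7 v2.x messages in the classic pipe-delimited encoding, the sketch below shows how such a message splits into segments and fields before any transformation. The `parse_hl7_v2` function and the sample message are hypothetical and all values are fictitious; components (`^`) and escape sequences are deliberately not handled, which is why a production pipeline would more likely rely on an established HL7 library.

```python
def parse_hl7_v2(message: str) -> dict[str, list[list[str]]]:
    """Split a pipe-delimited HL7 v2 message into segments and fields.

    Segments are separated by carriage returns (newlines are also accepted
    here for convenience); fields are separated by '|'. The result maps each
    segment ID (e.g. 'MSH', 'PID') to a list of field lists, because a
    segment type may occur more than once in a message.
    Note: in MSH the first '|' is itself field MSH-1, so MSH field numbers
    are offset by one relative to the list indices.
    """
    segments: dict[str, list[list[str]]] = {}
    for raw in message.replace("\n", "\r").split("\r"):
        if not raw.strip():
            continue  # skip empty lines, e.g. produced by CRLF line endings
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


# Fictitious sample message (not real patient data).
sample = (
    "MSH|^~\\&|EMED|HOSP|TINL|HOSP|202401011200||ADT^A01|MSG0001|P|2.5\r"
    "PID|1||123456^^^HOSP||Doe^John||19600101|M"
)
pid = parse_hl7_v2(sample)["PID"][0]
# For non-MSH segments the list index equals the field number:
# PID-3 = patient identifier, PID-7 = date of birth, PID-8 = administrative sex.
print(pid[3], pid[7], pid[8])  # 123456^^^HOSP 19600101 M
```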
First Processing Phase: Depersonalization Service
The first processing phase is performed by a depersonalization service running in the environment of the source system. The goal of this service is to perform transformations on the raw input data so that the data leaving the source system environment are already pseudonymized.
The resolution table of the pseudonymized identifiers, which contains the pairs of original and encoded identifiers, is stored in a separate database (the xRef database) that is physically separated from the analysis database.
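The pseudonymization algorithm is not named in the text. One common technique is a keyed hash (HMAC), which yields the same pseudonym for the same patient, so records remain linkable, while the pseudonym cannot be recomputed without the secret key. The sketch below uses HMAC-SHA256 purely as an assumption; the `Pseudonymizer` class, the record field names (`patient_id`, `name`, `icd10`), and the in-memory dictionary standing in for the separate xRef database are all hypothetical.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Keyed-hash pseudonymization with a separately held xRef table.

    Assumption: the secret key and the xRef mapping stay in the source-system
    environment; only the output of pseudonymize_record() leaves it.
    """

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        # In-memory stand-in for the physically separate xRef database:
        # pseudonym -> original identifier.
        self.xref: dict[str, str] = {}

    def pseudonym(self, patient_id: str) -> str:
        # HMAC-SHA256: deterministic for a given key, not reversible without it.
        digest = hmac.new(self._key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
        self.xref[digest] = patient_id
        return digest

    def pseudonymize_record(self, record: dict[str, str]) -> dict[str, str]:
        out = dict(record)
        out["patient_id"] = self.pseudonym(record["patient_id"])
        out.pop("name", None)  # direct identifiers are dropped, not encoded
        return out


p = Pseudonymizer(b"demo-key-not-for-production")
safe = p.pseudonymize_record({"patient_id": "123456", "name": "John Doe", "icd10": "I63.9"})
print(safe)  # name removed, patient_id replaced by a 64-character hex pseudonym
```

Only `safe` would leave the source environment; re-identification requires access to `xref`, which mirrors the separation between the xRef database and the analysis database described above.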
Data Storage and Transfer
The incoming (pseudonymized) data are temporarily stored in the staging layer, awaiting regular transfer to the comprehensive data warehouse (dwh) that contains the full dataset.
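The database engine and the transfer mechanism are not specified. As a minimal sketch, with Python's built-in `sqlite3` standing in for the real databases, a periodic transfer from staging to the warehouse can be written as one transaction that copies new rows and empties staging. The table and column names (`staging_encounter`, `dwh_encounter`, `encounter_id`, and so on) and the `transfer_staging_to_dwh` function are hypothetical.

```python
import sqlite3


def transfer_staging_to_dwh(conn: sqlite3.Connection) -> int:
    """Move pseudonymized rows from staging into the warehouse; return rows added.

    Rows whose encounter_id already exists in the warehouse are skipped
    (INSERT OR IGNORE on the primary key), so re-running a transfer does
    not create duplicates.
    """
    count_sql = "SELECT COUNT(*) FROM dwh_encounter"
    with conn:  # one transaction: commit on success, roll back on error
        before = conn.execute(count_sql).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO dwh_encounter (encounter_id, patient_pseudonym, encounter_date) "
            "SELECT encounter_id, patient_pseudonym, encounter_date FROM staging_encounter"
        )
        conn.execute("DELETE FROM staging_encounter")  # staging is only temporary storage
        after = conn.execute(count_sql).fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_encounter (encounter_id TEXT PRIMARY KEY, patient_pseudonym TEXT, encounter_date TEXT);
    CREATE TABLE dwh_encounter (encounter_id TEXT PRIMARY KEY, patient_pseudonym TEXT, encounter_date TEXT);
    """
)
rows = [("E1", "p-7f3a", "2024-01-02"), ("E2", "p-91bc", "2024-01-03")]
conn.executemany("INSERT INTO staging_encounter VALUES (?, ?, ?)", rows)
first_run = transfer_staging_to_dwh(conn)   # 2 new rows
conn.execute("INSERT INTO staging_encounter VALUES ('E1', 'p-7f3a', '2024-01-02')")
second_run = transfer_staging_to_dwh(conn)  # 0: E1 is already in the warehouse
```

Making the transfer idempotent is a design choice of this sketch: if a scheduled run is repeated after a failure, the warehouse does not accumulate duplicate encounters.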
The structure of the central database (dwh) is illustrated in the following diagram. The entity diagram shows that the data in the central database and data platform are organized in a structure that supports both the logic of patient care and more complex research and analysis. The structure highlights the following data categories:
- General patient data
- Data on doctor-patient encounters
- Diagnostic data
- Data related to outpatient and inpatient care and interventions
- Characteristics of medical reporting and documentation
The central database will be loaded with the patient care data recorded in the e-Medsolution system between 2014 and 2024, as authorized by the TUKEB approval obtained within the TINL project. In the near future, the data platform will also include patient-completed questionnaire data, as well as data held in earlier databases for the stroke and movement disorder disease areas.
4. Supporting Research and Analysis
To serve specific research objectives, two kinds of derived layers will be generated from the data warehouse (dwh) layer: data marts containing a narrowed dataset, and an OMOP database that complies with international healthcare data standards. The OMOP layer is typically refreshed on a regular schedule, whereas data marts are typically generated on demand, according to research needs.
The OMOP-compliant database layer contains, in pseudonymized form, the patient care data that have a corresponding OMOP representation. This data structure (the OMOP Common Data Model) and its dictionary are maintained by the Observational Health Data Sciences and Informatics (OHDSI) initiative, established in 2014, and are used as a unified healthcare data model by more than 2,000 affiliated institutions in 74 countries. Currently, participating partners store health data on more than 800 million patients in this structure.
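To illustrate what an "OMOP representation" means in practice, the sketch below maps one pseudonymized source patient to a row of the OMOP CDM PERSON table. The gender concept IDs 8507 (MALE) and 8532 (FEMALE) are the standard OMOP concepts, and 0 is OMOP's "No matching concept"; the source field names, the `M`/`F` sex codes, the `to_omop_person` function, and the choice to record race and ethnicity as 0 are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date

# Standard OMOP gender concepts; 0 means "No matching concept".
GENDER_CONCEPT_IDS = {"M": 8507, "F": 8532}


@dataclass(frozen=True)
class OmopPerson:
    """Subset of the OMOP CDM PERSON table columns."""
    person_id: int
    gender_concept_id: int
    year_of_birth: int
    month_of_birth: int
    day_of_birth: int
    race_concept_id: int
    ethnicity_concept_id: int
    person_source_value: str
    gender_source_value: str


def to_omop_person(person_id: int, pseudonym: str, birth_date: date, sex: str) -> OmopPerson:
    """Map one pseudonymized source patient to a PERSON row (hypothetical mapping)."""
    return OmopPerson(
        person_id=person_id,
        gender_concept_id=GENDER_CONCEPT_IDS.get(sex, 0),
        year_of_birth=birth_date.year,
        month_of_birth=birth_date.month,
        day_of_birth=birth_date.day,
        race_concept_id=0,  # assumed not recorded in the source system
        ethnicity_concept_id=0,
        person_source_value=pseudonym,  # the pseudonym, never the original identifier
        gender_source_value=sex,
    )


row = to_omop_person(1, "p-7f3a", date(1960, 1, 1), "M")
print(row.gender_concept_id, row.year_of_birth)  # 8507 1960
```

Source values without a standard mapping fall back to concept 0 rather than being dropped, which keeps the row while making the missing mapping visible to analyses.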
Data marts contain the patient data relevant to a specific research project for which ethical approval has been obtained. Each data mart includes both the data of patients who meet the inclusion criteria of the research and the care data of a control group defined by specified parameters.
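The actual inclusion criteria and control-group parameters are defined per study and are not given here. A minimal sketch, assuming a birth-year window as the eligibility parameter and an ICD-10 code prefix as the inclusion criterion, shows how one data mart splits eligible patients into cases and controls; the `Patient` type, the `build_data_mart` function, and the example patients are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pseudonym: str
    birth_year: int
    diagnoses: frozenset[str]  # ICD-10 codes recorded for the patient


def build_data_mart(patients, case_prefix: str, min_birth_year: int, max_birth_year: int):
    """Split eligible patients into study cases and controls.

    Hypothetical criteria: patients born within [min_birth_year, max_birth_year]
    are eligible; eligible patients with any ICD-10 code starting with
    `case_prefix` are cases, and the remaining eligible patients are controls.
    """
    eligible = [p for p in patients if min_birth_year <= p.birth_year <= max_birth_year]

    def is_case(p: Patient) -> bool:
        return any(code.startswith(case_prefix) for code in p.diagnoses)

    cases = [p for p in eligible if is_case(p)]
    controls = [p for p in eligible if not is_case(p)]
    return cases, controls


patients = [
    Patient("p-01", 1950, frozenset({"I63.9"})),  # cerebral infarction -> case
    Patient("p-02", 1955, frozenset({"E11.9"})),  # eligible, no I63 code -> control
    Patient("p-03", 1958, frozenset({"I10"})),    # eligible, no I63 code -> control
    Patient("p-04", 1990, frozenset({"I63.4"})),  # outside the birth-year window -> excluded
]
cases, controls = build_data_mart(patients, "I63", 1940, 1970)
print([p.pseudonym for p in cases], [p.pseudonym for p in controls])  # ['p-01'] ['p-02', 'p-03']
```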
Tools for accessing data for research purposes include software developed specifically for the OMOP database (e.g., Atlas), visualization tools for overview and descriptive analysis (Apache Superset), and development environments for running data analysis programs and procedure libraries (RStudio, JupyterLab). For data cataloging and domain interpretation, we use the OpenMetadata system.
Researchers, as end users of the system, access the web interfaces of the provided tools after logging in to a dedicated VPN.
5. Solutions Ensuring Data Protection
The protection of healthcare data is ensured by the following technological solutions:
- HL7 files containing personal identifiers are processed automatically, with no human access.
- Data processing and transformation run in an isolated environment that can be reached only via VPN.
- All data communication channels within the system are encrypted, and system and data access rules restrict data availability.
- The data on the platform are pseudonymized.
- Processing of HL7 files is logged, and only in the case of an error does the log contain data from the processed message.
- Processed HL7 files are deleted after processing.
- The translation (xRef) table used to re-identify pseudonymized data has restricted access.
- Researchers and analysts can only access data stored in the various layers of the central data platform, according to their authorization level.
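The HL7-related rules in the list above (automated processing, logging message content only on error, deleting processed files) can be combined into a single processing step. A minimal sketch, assuming files arrive in a local directory and a caller-supplied `handler` performs parsing and pseudonymization; `process_hl7_file`, `handler`, and `reject` are hypothetical names, and keeping failed files for a retry is an assumption, since the text does not say how they are handled.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable

logger = logging.getLogger("hl7_ingest")


def process_hl7_file(path: Path, handler: Callable[[str], None]) -> bool:
    """Process one HL7 file automatically and delete it on success.

    The normal log entry contains only the file name; message content is
    logged only when processing fails. Failed files are kept (assumption).
    """
    # Read bytes and decode: text-mode reading would translate the '\r'
    # segment separators of HL7 v2 into '\n'.
    message = path.read_bytes().decode("utf-8")
    try:
        handler(message)
    except Exception:
        logger.exception("Failed to process %s; message: %r", path.name, message)
        return False
    logger.info("Processed %s", path.name)
    path.unlink()  # processed HL7 files are not retained
    return True


def reject(_message: str) -> None:
    raise ValueError("unparseable message")  # simulated processing error


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "msg0001.hl7"
    good.write_bytes(b"MSH|^~\\&|EMED|HOSP\rPID|1||123456")
    ok = process_hl7_file(good, handler=lambda message: None)
    good_deleted = not good.exists()

    bad = Path(tmp) / "msg0002.hl7"
    bad.write_bytes(b"not an HL7 message")
    failed = process_hl7_file(bad, handler=reject)
    bad_kept = bad.exists()
```

Running the demo prints one error log entry (with the fictitious message content) for the rejected file, while the successful file leaves only a name-level log entry.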